Over-hyped AI will have to work a lot harder before it takes your job
Is the secret of artificial intelligence that we have to kid ourselves, like an audience at a magic show?
Some fascinating new research suggests that self-deception plays a key role in whether AI is perceived to be a success or a dud.
In a randomised controlled trial – the first of its kind – experienced computer programmers could use AI tools to help them write code. What the trial revealed was a vast amount of self-deception.
'The results surprised us,' research lab METR reported. 'Developers thought they were 20pc faster with AI tools, but they were actually 19pc slower when they had access to AI than when they didn't.'
In reality, using AI made them less productive: they were wasting more time than they had gained. But what is so interesting is how they swore blind that the opposite was true.
If you think AI is helping you in your job, perhaps it's because you want to believe that it works.
Since OpenAI's ChatGPT was thrown open to the general public in late 2022, pundits have been forecasting huge productivity gains from deploying AI. They hope that it will supercharge growth and boost GDP. This has become the default opinion in high-status policy circles.
But all this techno-optimism is founded on delusion. The 'lived experience' of using real tools in the real world paints a very different picture.
The past few days have felt like a turning point, as the reluctance to point out the emperor's new clothes diminishes.
'I build AI agents for a living, it's what I do for my clients,' wrote one Reddit user. 'The gap between the hype and what's actually happening on the ground is turning into a canyon.'
AI isn't reliable enough to do the job promised. According to an IBM survey of 2,000 chief executives, three out of four AI projects have failed to show a return on investment, which is a remarkably high failure rate.
Don't hold your breath for a white-collar automation revolution either: AI agents fail to complete the job successfully about 65 to 70pc of the time, according to a study by Carnegie Mellon University and Salesforce.
The analyst firm Gartner Group has concluded that 'current models do not have the maturity and agency to autonomously achieve complex business goals or follow nuanced instructions over time.' Gartner's head of AI research Erick Brethenoux says: 'AI is not doing its job today and should leave us alone'.
It's no wonder that companies such as Klarna, which laid off staff in 2023 while confidently declaring that AI could do their jobs, are hiring humans again.
This is extraordinary, and we can only have reached this point because of a historic self-delusion. People will even profess their faith that AI works well despite their own subjective experience to the contrary, the AI critic Professor Gary Marcus noted last week.
'Recognising that it sucks in your own speciality, but imagining that it is somehow fabulous in domains you are less familiar with', is something he calls 'ChatGPT blindness'.
Much of the news is misleading. Firms are simply using AI as an excuse for retrenchment. Cost reduction is the big story in business at the moment.
Globally, President Trump's erratic behaviour has induced caution, while in the UK, businesses, still reeling from Reeves's autumn taxes, report confidence at 'historically depressed levels', according to the Institute of Directors. Attributing those lay-offs to technology is simply clever PR, and helps boost the share price.
So why does the faith in AI remain so strong?
The dubious hype doesn't help. Every few weeks a new AI model appears and smashes industry benchmarks. xAI's Grok 4 did just that last week. But these benchmarks are deceptive and simply feed the confirmation bias.
'Every single one of them has been wide of that mark. And not one has resolved hallucinations, alignment issues or boneheaded errors,' says Marcus.
Not only is generative AI unreliable, but it can't reason, as a recent demonstration showed: OpenAI's ChatGPT, running its latest GPT-4o model, was beaten at chess by an 8-bit Atari home games console made in 1977.
'Reality is the ultimate benchmark for AI,' explained Chomba Bupe, a Zambian AI developer, last week. 'You're not going to declare that you have built intelligence by beating toy benchmarks … What's the point of getting say 90pc on some physics benchmarks yet be unable to do any real physics?' he asked.
Then there are thousands of what I call 'wowslop' accounts – social media feeds that declare amazement at every breakthrough. Beyond the vendors themselves, a lot of shadowy influence money is being spent on maintaining the hype.
This is not to say there aren't uses for generative AI: Anthropic has hit $4bn (£3bn) in annual revenue. For some niches, like language translation and prototyping, it's here to stay. Before it went mad last week, X's Grok was great at adding valuable context.
But even if AI 'discovers' new materials or medicines tomorrow, that won't compensate for the trillion dollars that Goldman Sachs estimates business has already wasted on this generation of dud AI.
That's capital that could have been invested far more usefully. Rather than an engine of progress, poor AI could be the opposite.
METR added an amusing footnote to its study. The researchers included one other comparison group in the productivity experiment, and this group made the most over-optimistic estimates of all. They were economists.